Update robots.txt #2632

dsofeir · 2019-08-18T02:23:49Z

I have found that Bing/Yahoo/DuckDuckGo, Yandex and Google report crawl errors when using the default robots.txt. Specifically, their bots will not crawl the path '/' or any of its sub-paths. I agree that the current robots.txt should work and properly implements the specification. However, in practice it does not work.

In my experience, explicitly permitting the path '/' by adding the directive 'Allow: /' resolves the issue.

More details can be found in a blog post about the issue here: https://www.dfoley.ie/blog/starting-with-the-indieweb
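To illustrate why the file "should work" per the specification, here is a minimal sketch using Python's standard-library urllib.robotparser. The rule set is hypothetical (a few Disallow lines for internal directories), not Grav's actual default robots.txt, and example.com is a placeholder. A spec-following parser allows '/' and its other sub-paths whether or not 'Allow: /' is present, and appending 'Allow: /' at the end does not re-open the disallowed directories.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Grav's actual default robots.txt.
BASE_RULES = [
    "User-agent: *",
    "Disallow: /cache/",
    "Disallow: /logs/",
    "Disallow: /system/",
]


def can_crawl(lines, url, agent="Googlebot"):
    """Parse robots.txt lines and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(lines)  # parse() also records a check time, so can_fetch() can answer
    return parser.can_fetch(agent, url)


without_allow = BASE_RULES
with_allow = BASE_RULES + ["Allow: /"]  # the extra directive, appended after the Disallow lines

for label, rules in (("without Allow: /", without_allow), ("with Allow: /", with_allow)):
    print(label)
    for url in ("https://example.com/", "https://example.com/blog/post", "https://example.com/cache/x"):
        print(f"  {url}: {can_crawl(rules, url)}")
```

Note that urllib.robotparser applies the first matching rule in file order, so in this parser 'Allow: /' has to come after the Disallow lines; placed first, it would allow every path.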

kinger-de · 2019-10-22T07:27:06Z

I have a similar problem with robots.txt. I have published a page whose content and image live at '/user/pages/01.page/01._module/default.md' and '/user/pages/01.page/01._module/default.jpg'. The image URL in the rendered HTML is 'domain.tld/user/pages/01.page/01._module/default.jpg'.

Googlebot can crawl all pages without any problems, but it does not index the image. I tested the image URL in Search Console and got the message that the image can't be indexed because it is blocked by robots.txt. If I test the same URL with Google's robots.txt Tester, everything looks fine: the rule 'Allow: /user/pages/' is highlighted and the response is 200.

I've also tried adding the 'Allow: /' rule, without success, and I've tried placing the Allow rules before the Disallow rules. Nothing helped.
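One reason different robots.txt testers can disagree is that they evaluate rules differently. Google documents a longest-match rule (the most specific matching path wins, and Allow wins ties), which RFC 9309 also specifies, while Python's urllib.robotparser applies the first matching rule in file order. The sketch below compares the two on the image path from this comment, using a hypothetical two-rule file. It deliberately ignores the '*' and '$' wildcards and is not a diagnosis of the Search Console result.

```python
from urllib.robotparser import RobotFileParser

# Path taken from the comment above; the rule lists below are hypothetical.
IMAGE_PATH = "/user/pages/01.page/01._module/default.jpg"


def longest_match_allowed(rules, path):
    """Longest-match evaluation: the longest matching prefix wins, Allow wins ties.

    Simplified sketch: plain prefix matching only, no '*' or '$' wildcards.
    """
    best_len, allowed = -1, True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            is_allow = directive.lower() == "allow"
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


def first_match_allowed(rules, path):
    """Evaluate the same rules with urllib.robotparser (first matching rule wins)."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *"] + [f"{d}: {p}" for d, p in rules])
    return parser.can_fetch("*", path)


disallow_first = [("Disallow", "/user/"), ("Allow", "/user/pages/")]
allow_first = list(reversed(disallow_first))

for name, rules in (("Disallow first", disallow_first), ("Allow first", allow_first)):
    print(name, longest_match_allowed(rules, IMAGE_PATH), first_match_allowed(rules, IMAGE_PATH))
```

Under longest-match evaluation the order of the lines makes no difference, so reordering Allow and Disallow rules is not expected to change how Google reads the file; order only matters for first-match parsers.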

Every robots.txt tester says the file is fine and the image URL is not blocked, except Google Search Console. Any hints on how I can get the image indexed?

hughbris · 2020-02-18T22:49:18Z

Does anyone know a way to report these clearly identified errors to these providers? All I know is that I've concluded it's futile trying to communicate with Google about their products. Even when they supposedly have channels open, nothing happens, not even an acknowledgement of the message. I'm not sure about the others.

kinger-de · 2020-04-03T13:28:27Z

I think Google is just listening; whether anything in their products changes, and what, will probably be decided elsewhere. As I understand it, the robots.txt Tester is not designed to check whether images can be indexed, so its feedback is not valid in general.

For my problem, I have now created a separate sitemap for images. That helped.
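An image sitemap like the one mentioned above can be generated with a short script. This is a minimal sketch using Python's xml.etree.ElementTree and the Google image sitemap extension namespace; the page URL and the page-to-images mapping are hypothetical, and 'domain.tld' is the placeholder domain from the earlier comment.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize with a default namespace and an "image:" prefix instead of ns0/ns1.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Build an image sitemap from a mapping of page URL -> list of image URLs."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page URL; the image path matches the earlier comment.
print(build_image_sitemap({
    "https://domain.tld/page": ["https://domain.tld/user/pages/01.page/01._module/default.jpg"],
}))
```

The resulting file can then be submitted in Search Console alongside the regular sitemap.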

rhukster merged commit ed87faa into getgrav:develop Aug 18, 2019
